Examining the 2014 Seattle and San Francisco summer crime data, as reported by the police, I noticed that San Francisco, despite having a population of less than 30 percent greater than the population of Seattle, had more than twice as many recorded cases of drug crime. Does this reflect a greater rate of use of certain drugs in San Francisco than in Seattle, or, rather, perhaps a difference in the manner of enforcement of drug laws or the manner in which reported infractions are recorded? Digging somewhat deeper—examining the data from 2013 through 2017 and on the use of certain popular drugs—I uncovered some striking trends and obtained some possible insight into these questions.

Examining the Seattle crime data from 2013 through 2017, one of the first things I noticed was a dramatic increase in the number of incidences of crime reflected in the police data. As the following plot of incidences of crime within some selected categories shows, the fractional increase in incidences of crime in the data is roughly the same across the various categories of crime.

library(tidyverse)
## ── Attaching packages ─────────────────────────────────────────────────────────────────────────────────── tidyverse 1.2.1 ──
## ✔ ggplot2 2.2.1.9000     ✔ purrr   0.2.4     
## ✔ tibble  1.4.2          ✔ dplyr   0.7.4     
## ✔ tidyr   0.7.2          ✔ stringr 1.2.0     
## ✔ readr   1.1.1          ✔ forcats 0.2.0
## ── Conflicts ────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
library(plotly)
## 
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## The following object is masked from 'package:stats':
## 
##     filter
## The following object is masked from 'package:graphics':
## 
##     layout
library(lubridate)
## 
## Attaching package: 'lubridate'
## The following object is masked from 'package:base':
## 
##     date
SF_complete <- read_csv("SF_Complete.csv")
## Parsed with column specification:
## cols(
##   IncidntNum = col_integer(),
##   Category = col_character(),
##   Descript = col_character(),
##   DayOfWeek = col_character(),
##   Date = col_character(),
##   Time = col_time(format = ""),
##   PdDistrict = col_character(),
##   Resolution = col_character(),
##   Address = col_character(),
##   X = col_double(),
##   Y = col_double(),
##   Location = col_character(),
##   PdId = col_double()
## )
seattle_complete <- read_csv("seattle_complete.csv", guess_max = 100000)
## Parsed with column specification:
## cols(
##   `RMS CDW ID` = col_integer(),
##   `General Offense Number` = col_double(),
##   `Offense Code` = col_character(),
##   `Offense Code Extension` = col_integer(),
##   `Offense Type` = col_character(),
##   `Summary Offense Code` = col_character(),
##   `Summarized Offense Description` = col_character(),
##   `Date Reported` = col_character(),
##   `Occurred Date or Date Range Start` = col_character(),
##   `Occurred Date Range End` = col_character(),
##   `Hundred Block Location` = col_character(),
##   `District/Sector` = col_character(),
##   `Zone/Beat` = col_character(),
##   `Census Tract 2000` = col_character(),
##   Longitude = col_double(),
##   Latitude = col_double(),
##   Location = col_character(),
##   Month = col_integer(),
##   Year = col_integer()
## )
#With guess max large enough, columns that contain integers that exceed the 
#32-bit maximum are read as character vectors.
SF_complete <- SF_complete %>%
  mutate(Year = as.integer(str_sub(Date, -4, -1))) %>%
  mutate(Category = fct_recode(Category,
        "Drugs" = "DRUG/NARCOTIC"))

seattle_complete <- seattle_complete %>%
  rename(Category = `Summarized Offense Description`) %>%
  mutate(Category = fct_recode(Category,
        "Prostitution" = "PROSTITUTION",
        "Drugs" = "NARCOTICS",
        "Weapons" = "WEAPON",
        "Liquor\nlaws" = "LIQUOR VIOLATION",
        "Assault" = "ASSAULT",
        "Homicide" = "HOMICIDE",
        "Robbery" = "ROBBERY",
        "Vehicle\ntheft" = "VEHICLE THEFT",
        "Theft\nfrom\nvehicle" = "CAR PROWL"))

seattle_complete <- seattle_complete %>%
  mutate(R_date = floor_date(mdy_hms(`Occurred Date or Date Range Start`,
                                     tz = "PST8PDT"), "day"))

SF_complete <- SF_complete %>%
  mutate(R_date = mdy(Date, tz = "PST8PDT"))

seattle_complete %>%
  filter(Category %in% 
        c("Prostitution",
        "Drugs",
        "Weapons",
        "Assault",
        "Homicide",
        "Robbery",
        "Vehicle\ntheft",
        "Theft\nfrom\nvehicle")) %>%
  filter(Year %in% 2013:2017) %>%
  ggplot(aes(x = Category, fill = as.factor(Year))) +
  geom_bar(position = "dodge") +
  ggtitle("Recorded Cases of Selected Categories of Crime: Seattle") +
  theme(plot.title = element_text(hjust = 0.5)) +
  labs(fill = "Year")

San Francisco has seen no such increase. Furthermore, there is no evidence of the massive crime wave in Seattle that this data would suggest, if the increase were taken to reflect the actual increase in crime. For example, there are reports of recent increases of use of some drugs, particularly heroin, but nothing close to the more than 100 percent increase in use, from one year to the next, that the data on drug crime in Seattle might seem to suggest.

Looking somewhat deeper at the recorded cases of illegal drug use in the two cities, let’s consider the differences between crimes that would be considered trafficking and those that would be considered possession.

library(stringr)

seattle_drugs_13_17 <- seattle_complete %>%
  filter(Category == "Drugs" & Year %in% c(2013:2017))

SF_drugs_13_17 <- SF_complete %>%
  mutate(`Cited or Arrested` = (str_detect(Resolution, "ARREST") | 
                                str_detect(Resolution, "CITED"))) %>%
  filter(Category == "Drugs" & Year %in% c(2013:2017))

SF_drugs_13_17 <- SF_drugs_13_17 %>% 
  mutate(`Drug Offense Type` = 
          ifelse(str_detect(Descript, "LAB APPARATUS"), "Sales, Transportation, or Production",
              ifelse(str_detect(Descript, "POSSESSION"), "Possession",
              ifelse(str_detect(Descript, "SALE"), "Sales, Transportation, or Production",
              ifelse(str_detect(Descript, "TRANSPORT"), "Sales, Transportation, or Production",
              ifelse(str_detect(Descript, "PLANTING/CULTIVATING MARIJUANA"), 
                     "Sales, Transportation, or Production",
              "Other"
              ))))))

seattle_drugs_13_17 <- seattle_drugs_13_17 %>%
  mutate(`Drug Offense Type` = 
          ifelse(str_detect(`Offense Type`, "POSSESS"), "Possession",
              ifelse(str_detect(`Offense Type`, "FOUND"), "Possession",
              ifelse(str_detect(`Offense Type`, "DISTRIBUTE"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "SMUGGLE"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "PRODUCE"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "SELL"), 
                     "Sales, Transportation, or Production",
              ifelse(str_detect(`Offense Type`, "TRAFFIC"), 
                     "Sales, Transportation, or Production",
              "Other"))))))))
  

joined_counts <- (seattle_drugs_13_17 %>% count(Year, `Drug Offense Type`)) %>%
  left_join((SF_drugs_13_17 %>% count(Year, `Drug Offense Type`)), by = c("Year", "Drug Offense Type")) %>%
  gather(n.x, n.y, key = City, value = Incidents) %>%
  mutate(City = fct_recode(City,
                           Seattle = "n.x",
                           `San Francisco` = "n.y"))

#reformat some categories that will appear in the plot
joined_counts <- joined_counts %>% 
  mutate(`Drug Offense Type` = as.factor(`Drug Offense Type`)) %>%
  mutate(`Drug Offense Type` = 
           fct_recode(`Drug Offense Type`,
                      " Sales, Transportation,\nor Production" = "Sales, Transportation, or Production",
                      " Possession" = "Possession"))

plot <- ggplot(joined_counts %>% filter(`Drug Offense Type` != "Other")) +
  geom_col(aes(x = Year, y = Incidents, fill = City, color = `Drug Offense Type`),
                        position = "dodge") +
  theme(legend.title = element_blank()) +
  labs(x = "year", y = "count",
       title = "Drug-related Police Incidents: possession versus trafficking")

ggplotly(plot) %>% layout(margin = list(b = 50, l = 60, r = 10, t = 80))
seattle_drugs_13_17 <- seattle_drugs_13_17 %>%
  mutate(`Drug Type` = ifelse(str_detect(`Offense Type`, "MARIJU"), "Marijuana",    #yes
                            ifelse(str_detect(`Offense Type`, "METH"), "Methamphetamine",    #yes
                            ifelse(str_detect(`Offense Type`, "COCAINE"), "Cocaine",  #yes 
                            ifelse(str_detect(`Offense Type`, "HEROIN"), "Heroin",  #yes
                            ifelse(str_detect(`Offense Type`, "PRESCRIPTION"), "Prescription", #?
                            ifelse(str_detect(`Offense Type`, "PILL/TABLET"), "Pill/Tablet",  #?
                            ifelse(str_detect(`Offense Type`, "HALLUCINOGEN"), "Hallucinogen",
                            ifelse(str_detect(`Offense Type`, "SYNTHETIC"), "Synthetic",  #?
                            ifelse(str_detect(`Offense Type`, "AMPHETAMINE"), "Amphetamine", #yes
                            ifelse(str_detect(`Offense Type`, "OPIUM"), "Opium", #yes
                            ifelse(str_detect(`Offense Type`, "PARAPHENALIA"), "Paraphernalia", #yes
                            "Other")
                            )))))))))))
  
SF_drugs_13_17 <- SF_drugs_13_17 %>%
  mutate(`Drug Type` = ifelse(str_detect(Descript, "MARIJUANA"), "Marijuana",   #yes
                            ifelse(str_detect(Descript, "COCAINE"), "Cocaine",
                            ifelse(str_detect(Descript, "METH-AMPHETAMINE"), "Methamphetamine",
                            ifelse(str_detect(Descript, "BARBITUATES"), "Barbituates",
                            ifelse(str_detect(Descript, "CONTROLLED SUBSTANCE"), 
                                              "Controlled Substance",
                            ifelse(str_detect(Descript, "HALLUCINOGENIC"), "Hallucinogenic",
                            ifelse(str_detect(Descript, "AMPHETAMINE"), "Amphetamine", 
                            ifelse(str_detect(Descript, "METHADONE"), "Methadone",
                            ifelse(str_detect(Descript, "PARAPHERNALIA"), "Paraphernalia",
                            ifelse(str_detect(Descript, "OPIATES"), "Opiates",
                            ifelse(str_detect(Descript, "OPIUM"), "Opium",
                            ifelse(str_detect(Descript, "HEROIN"), "Heroin",
                            "Other")
                            ))))))))))))

#ggplot(SF_drugs_13_17) +
#  geom_bar(aes(`Drug Type`)) +
#  labs(title = "San Francisco") +
#  theme(axis.text.x = element_text(angle = 75, hjust = 1, vjust = 1))

seattle_selected_drugs <- seattle_drugs_13_17 %>%
  filter(`Drug Type` %in% c("Marijuana", "Cocaine", "Methamphetamine", "Heroin"))

SF_selected_drugs <- SF_drugs_13_17 %>%
  filter(`Drug Type` %in% c("Marijuana", "Cocaine", "Methamphetamine", "Heroin"))

#ggplot(seattle_selected_drugs) +
#  geom_bar(aes(`Drug Type`)) +
#  labs(title = "Seattle") +
#  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1))

#ggplot(SF_selected_drugs) +
#  geom_bar(aes(`Drug Type`)) +
#  labs(title = "San Francisco") +
#  theme(axis.text.x = element_text(angle = 75, hjust = 1, vjust = 1))

joined_drug_counts <- (seattle_selected_drugs %>% count(Year, `Drug Type`)) %>%
  left_join((SF_selected_drugs %>% count(Year, `Drug Type`)), by = c("Year", "Drug Type"))
  
joined_drug_counts <- joined_drug_counts %>%
  rename(Seattle = n.x, `San Francisco` = n.y) %>%
  gather(Seattle, `San Francisco`, key = City, value = count)
  
plt <- ggplot(joined_drug_counts) + geom_col(aes(x = Year, y = count, fill = City, color = `Drug Type`), 
                                      position = "dodge")

ggplotly(plt) %>% layout(margin = list(b = 50, l = 60, r = 10, t = 80))
library(lubridate)
library(gridExtra)
## 
## Attaching package: 'gridExtra'
## The following object is masked from 'package:dplyr':
## 
##     combine
#fig.height = 2, fig.width = 3

#seattle_drugs_13_17 <- seattle_drugs_13_17 %>%
#  mutate(R_date = floor_date(mdy_hms(`Occurred Date or Date Range Start`,
#                                     tz = "PST8PDT"), "day"))

#SF_drugs_13_17 <- SF_drugs_13_17 %>%
#  mutate(R_date = mdy(Date, tz = "PST8PDT"))

seattle_selected_drugs <- seattle_drugs_13_17 %>%
  filter(`Drug Type` %in% c("Marijuana", "Cocaine", "Methamphetamine", "Heroin"))

SF_selected_drugs <- SF_drugs_13_17 %>%
  filter(`Drug Type` %in% c("Marijuana", "Cocaine", "Methamphetamine", "Heroin"))

number_of_bins <- 78

plt_seattle_selected_drugs <- ggplot(seattle_selected_drugs) + 
  geom_freqpoly(aes(x = R_date, color = `Drug Type`), bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "number per fortnight", x = "",
       title = "Seattle Police Incidents Related to 4 Popular Drugs",
       color = "Drug: ") +
  theme(legend.position = "bottom",
        plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 12),
        axis.text.x = element_text(size = 10),
        legend.text = element_text(size = 10),
        legend.title = element_text(size = 12)) +
  guides(size = guide_legend(order = 2))

plt_SF_selected_drugs <- ggplot(SF_selected_drugs) +
  geom_freqpoly(aes(x = R_date, color = `Drug Type`), bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "number per fortnight", x = "",
       title = "San Francisco Police Incidents Related to 4 Popular Drugs") +
  theme(legend.position = "none",
        plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 12),
        axis.text.x = element_text(size = 10))

plt_seattle_all_13_17 <- seattle_complete %>% 
  filter(Year %in% 2013:2017) %>%
  ggplot(aes(x = R_date)) + geom_freqpoly(bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "number per fortnight", x = "",
       title = "Seattle Police Incidents: 2013 through 2017") +
  theme(plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 12),
        axis.text.x = element_text(size = 10))

plt_SF_all_13_17 <- SF_complete %>%
  filter(Year %in% 2013:2017) %>%
  ggplot(aes(x = R_date)) + geom_freqpoly(bins = number_of_bins) +
  scale_x_datetime(date_breaks = "4 months", 
                   limits = c(as.POSIXct("2012/12/31", tz = "PST8PDT"), 
                              as.POSIXct("2018/01/01", tz = "PST8PDT"))) +
  theme(axis.text.x = element_text(angle = 65, hjust = 1, vjust = 1)) +
  labs(y = "number per fortnight", x = "",
       title = "San Francisco Police Incidents: 2013 through 2017") +
  theme(plot.title = element_text(size = 14),
        axis.title.y = element_text(size = 12),
        axis.text.x = element_text(size = 10))

grid.arrange(plt_seattle_all_13_17, plt_SF_all_13_17, 
             plt_SF_selected_drugs, plt_seattle_selected_drugs,  
             layout_matrix = rbind(c(1, 2),
                                   c(4, 3),
                                   c(4, NA)),
             heights = c(5, 5, 1))
## Warning: Removed 1348 rows containing non-finite values (stat_bin).
## Warning: Removed 3 rows containing missing values (geom_path).

## Warning: Removed 3 rows containing missing values (geom_path).
## Warning: Removed 12 rows containing missing values (geom_path).

## Warning: Removed 12 rows containing missing values (geom_path).